An approach to similarity measurement of absence-presence data: the case that common zeros matter

Authors

  • Leo Egghe
  • Ronald Rousseau
Abstract

Similarity between objects (documents, persons, answers to a questionnaire, etc.) is generally determined through relations between representations of these objects. In the case of binary representations, the presence of a property (e.g., an index term) carries a weight of one, and its absence a weight of zero. In many similarity studies common zeros are ignored. This situation is called the zero insensitive case. In this article, however, we study the zero sensitive case. Clearly, answers to binary questionnaires (yes-no, encoded as 1-0) are zero sensitive, as people who answer 'no' to the same questions are more similar. We present a wish list for such a zero sensitive approach to similarity. Distinguishing between common zeros and common ones leads to an 'identity-similarity' theory. Hence, we move beyond a pure similarity theory. Three approaches to the problem of similarity measurement of presence-absence data, where common zeros matter, are presented. In each case a coding approach is used, leading to new representations, which then lead to a similarity ranking. Examples of functions respecting these rankings are given.
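The difference between the zero insensitive and zero sensitive cases can be illustrated with two textbook measures. The sketch below is not the authors' coding approach, which the abstract does not specify. It only contrasts the Jaccard index, which ignores common zeros, with the simple matching coefficient, which counts them. The function names and the sample vectors are assumptions chosen for illustration.

```python
from typing import Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Zero insensitive: common zeros do not enter the score."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    common_ones = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    any_one = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention assumed here: two all-zero vectors score 0.0,
    # since the ratio is otherwise undefined.
    return common_ones / any_one if any_one else 0.0


def simple_matching(a: Sequence[int], b: Sequence[int]) -> float:
    """Zero sensitive: common zeros count as agreement."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    agreements = sum(1 for x, y in zip(a, b) if x == y)
    return agreements / len(a)


# Hypothetical yes/no questionnaire answers (1 = yes, 0 = no).
respondent_a = [1, 1, 0, 0, 0, 0]
respondent_b = [1, 0, 0, 0, 0, 0]

print(jaccard(respondent_a, respondent_b))          # one shared 'yes' out of two
print(simple_matching(respondent_a, respondent_b))  # five agreements out of six
```

Two respondents who answer 'no' to every question show the contrast most sharply: the simple matching coefficient treats them as identical, while the Jaccard index has no common ones to work with. Note that the simple matching coefficient treats a common zero and a common one alike, so it does not by itself capture the distinction behind the 'identity-similarity' theory.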

Journal title:
  • J. Information Science

Volume 30  Issue 

Pages  -

Publication date 2004